ABSTRACT

We calculate an empirical, non-parametric estimate of the shape of the radius distribution of small planets with periods less than 90 days using the small yet well-characterized sample of cool ($T_{\rm eff} < 4000$ K) dwarf stars in the Kepler catalog. Using a new technique we call a modified kernel density estimator (MKDE) and carefully correcting for incompleteness, we show that planets with radii $\sim1.25\,R_\oplus$ are the most common planets around these stars. An apparent overabundance of planets with radii 2-2.5 $R_\oplus$ may be evidence for a population of planets with H/He atmospheres. Lastly, the sharp rise in the radius distribution from $\sim4\,R_\oplus$ to 2 $R_\oplus$ implies that a large number of planets await discovery around cool dwarfs as the sensitivities of ground-based surveys increase. The radius distribution will continue to be tested with future Kepler results, but the features reported herein are robust features of the current dataset and thus invite theoretical explanation in the context of planetary system formation and evolution around cool stars.



1. INTRODUCTION 



The discovery of the first exoplanets (Wolszczan & Frail 1992; Mayor & Queloz 1995; Marcy & Butler 1996) has sparked tremendous growth in research and interest in the formation and evolution of planetary systems beyond the Solar System. Not unlike many areas of astronomy, however, the first discoveries are not representative samples; rather, "hot Jupiters" are relatively rare (Wright et al. 2012; Howard et al. 2010) in comparison to the new populations of exoplanets now being revealed by the Kepler Mission (Borucki et al. 2011; Batalha et al. 2012; Burke et al. 2013). The most common kinds of planets within Kepler's discovery space in $R_p$ and $P < 100$ d appear to be somewhat larger than Earth but smaller than Neptune, $1 < R_p < 4\,R_\oplus$ (Howard et al. 2012; Fressin et al. 2013; Dressing & Charbonneau 2013).



Much of our understanding of planet formation is anchored in decades of research into our own solar system. But now the burgeoning exoplanet population provides us with a new context revealing important insights into planet formation throughout the Galaxy. For example, the large amount of planetary mass seen close to host stars is evidence that protoplanetary disks may have much higher surface densities than previously thought (Hansen & Murray 2012; Chiang & Laughlin 2012) or that the observed planets migrated from regions further from their host star where more mass was readily available for assembly (Swift et al. 2013).

The multi-transit systems of the Kepler sample also offer a wealth of information regarding their formation and evolution (Lissauer et al. 2011; Fabrycky et al. 2012b). The period ratios of planets within a given system show a propensity to lie just outside of first-order mean motion resonances (Fabrycky et al. 2012a; Steffen et al. 2013), which may be an imprint of dissipative (Lithwick & Wu 2012; Batygin & Morbidelli 2013) or stochastic mechanisms (Rein 2012) in the formation or evolution of planetary systems. The low inferred mutual inclination of multi-transit systems ($\sim1^\circ$-$3^\circ$; Fabrycky et al. 2012b; Fang & Margot 2012) together with the relative number of single versus multi-transit systems provides constraints on the number of planets in a given system within Kepler's discovery window, else it may be the first indication of a separate, high-inclination population of single transit systems (Hansen & Murray 2013; Fang & Margot 2012). Lastly, the mutual gravitational interactions within some multi-planet systems offer an estimation of the planet masses (Lithwick et al. 2012) that then inform planetary compositions and atmospheric evolution scenarios (Rogers et al. 2011; Wu & Lithwick 2012; Lopez et al. 2012).

Fig. 1. Evidence supporting the hypothesis that small planets are incomplete in the Cool KOI sample. The solid black line is the observed (smoothed) distribution of planets smaller than 1 $R_\oplus$; the grey shaded area is the observed period distribution of all the Cool KOIs. (Neither distribution is corrected for transit probability.) The vertical dashed red line indicates the period at which a 1 $R_\oplus$ planet around a 0.5 $R_\odot$ star (typical of the Cool KOI sample) would have SNR of 7.1, the nominal detection threshold for KOI identification. The lack of observed small planets at periods longer than 10 days is thus very plausibly due to incompleteness.

In this article we focus on yet another important clue regarding the formation of the compact systems revealed by Kepler: the distribution of planetary radii. The initial estimates of the planet radius distribution by Howard et al. (2012) showed a dramatic increase in the number of planets at ever smaller size. Citing incompleteness, however, they did not follow this trend in their analysis to planet radii smaller than 2 $R_\oplus$. In an independent study by Youdin (2011), a parametric estimation of the planetary distribution function revealed a deficit of large planets in short period orbits that would support a core accretion then migration formation scenario.

More recent estimates of the planet radius distribution show a preferential size scale in the Kepler sample indicated by a flattening and possible turnover in the log-binned histogram of detected planet candidates somewhere around 2 $R_\oplus$ (Fressin et al. 2013; Dressing & Charbonneau 2013; Petigura & Marcy 2013). If true, this would be an important clue toward understanding the key mechanisms that shape the observed population of compact planetary systems that pervade the Galaxy. However, these analyses are constrained by the limitations of coarse histograms; no analysis to date has yet characterized the shape of the exoplanet radius distribution in enough detail to allow meaningful comparison to planet formation and evolution theories.

With this in our sights, we focus on the smallest stars in the Kepler Object of Interest (KOI) sample for two reasons: (1) between the spectroscopic studies of the M dwarf sample by Muirhead et al. (2012b) and the photometric re-calibration of the Kepler Input Catalog (KIC) for the coolest stars by Dressing & Charbonneau (2013) the "Cool KOIs" constitute a well characterized sample, and (2) since a transit signal is proportional to the square of the planet to star radius ratio, the Cool KOIs are optimal for probing the planet radius distribution for the smallest planet sizes.

The particular goal of this work is to derive the shape of the planet radius function properly marginalized over orbital period. Figure 1 illustrates why this is an issue: the smallest planets are not complete out to the same orbital periods as larger planets; thus careful correction is required in order to achieve this goal. These concepts of incompleteness and period bias were studied in detail early in the history of transit surveys (Pepper et al. 2003; Gould et al. 2003; Gaudi) but have yet to be applied in detail to Kepler data.

In §2 we walk through the steps required to properly extract a non-parametric empirical estimate of the true planet radius function given a population detected in a well-characterized transit survey. In §3 we apply these methods to the Cool KOIs to derive the radius distribution for small planets around small stars. We explore the various assumptions that go into this calculation in §5 and conclude in §6.



2. FORMALISM

We define the planet radius distribution function $\phi_r^{P_{\max}}(r)$ such that

$$\int_{r_{\min}}^{r_{\max}} \phi_r^{P_{\max}}(r)\,dr = {\rm NPPS},\quad P < P_{\max}, \qquad (1)$$

that is, a density function with an overall normalization giving the average number of planets per star (NPPS) for planets with period less than $P_{\max}$ days, for planet radii $r$ between $r_{\min}$ and $r_{\max}$. The problem of calculating planet occurrence rates from Kepler has been quite an industry over the last few years (Youdin 2011; Howard et al. 2012; Dong & Zhu 2012; Swift et al. 2013; Fressin et al. 2013; Petigura & Marcy 2013; Dressing & Charbonneau 2013). However, there has been little quantitative discussion of deriving the detailed shape of the radius function beyond drawing histograms. In the following subsections, we review and refine the general principles of an occurrence calculation and then describe how to follow these principles to construct a non-parametric empirical radius function that obeys the above desired properties.

2.1. Occurrence Calculations 

In a perfectly idealized survey that is both 100% reliable and 100% complete, the occurrence rate of planets is simply

$${\rm NPPS} = \frac{N_p}{N_\star}, \qquad (2)$$

where $N_p$ is the number of detected planets and $N_\star$ is the number of stars surveyed. In practice, however, this must be corrected for both incompleteness and unreliability as follows:

$${\rm NPPS} = \frac{1}{N_\star} \sum_{i=1}^{N_p} w_i. \qquad (3)$$

Here the sum is over all detections and $w_i$ is a weighting factor applied individually to account for the various necessary corrections. Generally, these weights can be thought of as

$$w_i = \frac{1 - {\rm FPP}_i}{\eta_i}, \qquad (4)$$

where ${\rm FPP}_i$ is the probability that signal $i$ is a false positive and $\eta_i$ is an individualized efficiency factor for the detection of planet $i$. In this work, we do not incorporate the calculations of ${\rm FPP}_i$ in detail, as the a priori false positive rate among candidates is low (Morton & Johnson 2011), and ongoing analysis (Morton et al., in prep) according to the false positive-calculating procedure of Morton (2012) indicates the false positive rate in this particular sample is negligibly low.

We thus focus on the detection efficiency $\eta_i$, which is defined by the following thought experiment: if a very large number of planets identical to planet $i$ were distributed randomly around all the stars in the survey, only a fraction $\eta_i$ could have been detected. This can be further factored (following Youdin 2011):

$$\eta_i = \eta_{{\rm tr},i} \cdot \eta_{{\rm disc},i}, \qquad (5)$$

where $\eta_{\rm tr}$ is the geometric transit probability, and $\eta_{\rm disc}$ is the "discovery efficiency": the fraction of planets in this thought experiment with transiting orbital geometries that could have been detected by the survey. In previous Kepler occurrence rate calculations (Howard et al. 2012; Swift et al. 2013; Dressing & Charbonneau 2013), this factor has been defined as

$$\eta_{{\rm disc},i} = \frac{N_{\star,i}}{N_\star}, \qquad (6)$$

where $N_{\star,i}$ is the number of target stars around which planet $i$ could have been detected. As we show, however, this is not sufficient to properly characterize $\eta_{{\rm disc},i}$; more care must be taken.

Central to correctly calculating $\eta_{\rm disc}$ is the fact that whatever transit detection pipeline is used, it is incomplete, especially near the detection threshold. This pipeline incompleteness must be carefully considered in any occurrence rate calculation. The analysis of Petigura & Marcy (2013) is a model of one way this can be done: simulating planets throughout the radius and period parameter space considered in order to directly measure $\eta_{\rm disc}$ as a function of planet radius and period.

A more general conceptual way to attack this problem, and the only one available if one is relying on the results of someone else's detection pipeline, is to assume that the detection efficiency of any pipeline is a function only of the signal-to-noise ratio (SNR) of the transit signal. This was the approach taken by Fressin et al. (2013), who determined that for the Batalha et al. (2012) Q1-Q6 catalog, the detection efficiency of the Kepler pipeline could be modeled by a continuous "SNR ramp" function of the following form:

$$\eta_{\rm SNR}({\rm SNR}) = \begin{cases} 0, & {\rm SNR} < S_0 \\ {\rm linear}, & S_0 < {\rm SNR} < S_1 \\ 1, & {\rm SNR} > S_1, \end{cases} \qquad (7)$$

where in Batalha et al. (2012) $S_0 = 6$ and $S_1 = 16$. This is notably different from a sharp detection threshold at SNR = 7.1, which was used by both Swift et al. (2013) and Dressing & Charbonneau (2013) in their occurrence calculations. The newly released and more uniformly vetted Q1-Q8 KOI catalog, currently hosted at the NASA Exoplanet Archive, is better characterized by a steeper SNR ramp (F. Fressin, priv. comm.), as it is more complete than Batalha et al. (2012); thus, we adopt an SNR ramp where $S_0 = 6$ and $S_1 = 12$.
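As a minimal sketch, the ramp of Equation 7 can be written as a short Python function. The name `snr_ramp` and its keyword arguments are illustrative choices made here, not part of any Kepler pipeline; the defaults are the $S_0 = 6$, $S_1 = 12$ values adopted in this work.

```python
def snr_ramp(snr, s0=6.0, s1=12.0):
    """Detection efficiency of Equation 7.

    Returns 0 at or below s0, 1 at or above s1, and a linear
    interpolation between the two thresholds.
    """
    if s1 <= s0:
        raise ValueError("s1 must be greater than s0")
    if snr <= s0:
        return 0.0
    if snr >= s1:
        return 1.0
    return (snr - s0) / (s1 - s0)
```

Passing `s0=6, s1=16` instead recovers the shallower ramp quoted above for the Q1-Q6 catalog.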

With this function defined, $\eta_{{\rm disc},i}$ can then be calculated by the following procedure: simulate planet $i$ around every star in the survey, each of which has both a different radius and different photometric noise properties, to obtain a (normalized) distribution of SNRs $\phi_{{\rm SNR},i}$ for that planet, and then marginalize the detection efficiency over that distribution:

$$\eta_{{\rm disc},i} = \int_0^\infty \eta_{\rm SNR}(s) \cdot \phi_{{\rm SNR},i}(s)\,ds. \qquad (8)$$

An extremely important consideration in this procedure of constructing $\phi_{{\rm SNR},i}$ is properly treating orbital period. Remember, the ultimate goal of this analysis is to calculate the distribution of planet radii with period less than $P_{\max}$: $\phi_r^{P_{\max}}$. Thus, in simulating the population of planet $i$ clones around other stars, it is important to assign each of these clones an orbital period $P < P_{\max}$ according to a reasonable estimate of the true planet period distribution $\phi_P$.
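Because Equation 8 is the expectation value of the ramp over the clone SNR distribution, one hedged way to evaluate it is a simple Monte Carlo average over simulated SNR values. The function names below are assumptions made for illustration, and the input list stands in for the SNRs of the simulated clones of planet $i$.

```python
def snr_ramp(snr, s0=6.0, s1=12.0):
    """Equation 7: 0 below s0, linear ramp, 1 above s1."""
    return min(1.0, max(0.0, (snr - s0) / (s1 - s0)))


def discovery_efficiency(clone_snrs, s0=6.0, s1=12.0):
    """Monte Carlo estimate of Equation 8.

    The integral of eta_SNR against the normalized SNR distribution
    equals the mean of eta_SNR over samples drawn from that distribution.
    """
    clone_snrs = list(clone_snrs)
    if not clone_snrs:
        raise ValueError("need at least one simulated SNR")
    return sum(snr_ramp(s, s0, s1) for s in clone_snrs) / len(clone_snrs)
```

Averaging over samples avoids choosing a histogram bin width, at the cost of keeping every simulated SNR in memory.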

Distributing the hypothetical planets according to a period distribution is crucial because both SNR and planet occurrence are functions of orbital period. For example, imagine that a survey of 1000 stars detects one 0.5 $R_\oplus$ planet in a 1-day orbit with low SNR (e.g. SNR = 10). An occurrence analysis in the style of Equation 6 might conclude that this planet had a 20% transit probability and would have been detectable around only half the stars in the survey, thus giving it a weight factor of $w_i = 10$ and leading to the conclusion that 0.5 $R_\oplus$ planets are rare, only existing around 1% of stars. A slightly more sophisticated analysis might note that $\eta_{\rm SNR}(10) = 0.4$, and give another factor of 2.5 boost, concluding that planets of this size exist around only 2.5% of stars.
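The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the function name and argument names are hypothetical, and the numbers are those of the thought experiment above.

```python
def occurrence_from_single_detection(n_stars, transit_prob, detectable_frac, eta_snr=1.0):
    """Weight and implied occurrence rate for one detected planet.

    The weight is w_i = 1 / eta_i, with eta_i the product of the transit
    probability, the fraction of stars around which the planet is
    detectable, and (optionally) the SNR-ramp efficiency.
    """
    eta = transit_prob * detectable_frac * eta_snr
    weight = 1.0 / eta
    return weight, weight / n_stars


# Equation 6 style: 20% transit probability, detectable around half the stars.
w_simple, rate_simple = occurrence_from_single_detection(1000, 0.20, 0.5)

# Adding the ramp efficiency eta_SNR(10) = 0.4 boosts the weight by 2.5.
w_ramp, rate_ramp = occurrence_from_single_detection(1000, 0.20, 0.5, eta_snr=0.4)
```

Up to floating-point rounding, the two cases give weights of 10 and 25, that is, occurrence rates of 1% and 2.5%.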

However, this conclusion would still be incorrect, since it does not account for the fact that planets of this size may only be detectable at very short periods. What if only a very small fraction of all planets happen to have periods as short as 1 day? In this case, the supposed rarity of 0.5 $R_\oplus$ planets would be just a misinterpretation of the fact that planets with 1-day orbits are rare. The small planets that doubtless still do exist at larger orbital periods have not been detected, and no correction has been made to account for this. When constructing $\phi_{{\rm SNR},i}$, distributing the hypothetical planets according to a reasonable period distribution will avoid this misdiagnosis. This was not done in either Swift et al. (2013) or Dressing & Charbonneau (2013), leading both analyses to underestimate the occurrence rate of small planets around small stars. In §3 we show this also makes a qualitative difference in the interpretation of the planet radius function.

2.2. Estimating the Radius Distribution 
Function 

In all the Kepler planet occurrence calculations to date, the shape of the radius function has been explored only very coarsely, by calculating the occurrence rate in several different radius bins and either fitting a power law or qualitatively commenting on the shape. Howard et al. (2012) found a good fit to an $R^{-2}$ power law down to 2 $R_\oplus$, and declined to comment for smaller planets. On the other hand, Fressin et al. (2013) and Petigura & Marcy (2013) note that the occurrence rate of planets increases towards smaller radius but then appears to flatten out below about 2.8 $R_\oplus$. Dressing & Charbonneau (2013) claim that the occurrence rate begins to decrease for planets smaller than 1-1.4 $R_\oplus$.

Investigating the shape of the radius distribution in more detail requires a non-parametric approach, and also should avoid binning. Here we introduce the concept of a modified kernel density estimator (MKDE) in order to accomplish this.

A standard kernel density estimator (KDE) attempts to estimate the true underlying probability distribution of a sample of data points using a function of the following form:

$$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} k(x - x_i; \sigma_i), \qquad (9)$$

where $N$ is the number of data points and $k(x)$ is a zero-mean, normalized kernel function of arbitrary shape (commonly a Gaussian), with some width $\sigma_i$ that most generally can be different for each data point. This creates a smooth distribution out of a discrete data set, with the degree of smoothness controlled by the width parameter. The choice of width has tradeoffs in both directions: if the kernels are too narrow the estimator will be bumpy, but if they are too wide they can wash out real structure in the distribution. Often the width is selected to be the same for all points based on the number of data points, or sometimes a variable-width kernel is used, e.g. the distance to the nth nearest neighbor. The $1/N$ normalization factor assures that the integral of this density estimator over the whole parameter space is unity.

In order to use the KDE concept to properly reconstruct the radius function of planets detected in a transit survey, each data point has to be weighted appropriately, leading to a modified KDE, or MKDE:

$$\phi_r^{P_{\max}}(r) = \frac{1}{N_\star} \sum_{i=1}^{N_p} w_i \cdot k(r - r_i; \sigma_i), \qquad (10)$$

where $r_i$ is the radius of planet $i$ and $w_i = 1/\eta_i$ are the appropriately calculated individual weight factors that renormalize the kernels to correct for missing planets, as discussed in §2.1. The weights ensure that the shape of the radius function responds appropriately to the individual corrections, and the $1/N_\star$ overall normalization ensures that the integral over all radii will return the NPPS, as desired in Equation 1. A very natural choice for the $\sigma_i$ in this case, which avoids having to choose an arbitrary smoothing factor, is the uncertainty in each planet's radius, most of which comes from uncertainty in the radius of the host star. If this does not make for a sufficiently smooth distribution, then the $\sigma_i$ can be multiplied by an additional factor to increase the smoothing.
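To make the normalization property concrete, the following is a minimal sketch of Equation 10 with Gaussian kernels, together with a trapezoidal check that the integral over radius returns $\sum_i w_i / N_\star$. The function names, the Gaussian kernel choice, and the toy inputs are assumptions made for illustration only.

```python
import math


def gaussian_kernel(x, sigma):
    """Zero-mean, unit-area Gaussian kernel of width sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mkde(r, radii, sigmas, weights, n_stars):
    """Equation 10 evaluated at a single radius r."""
    total = sum(w * gaussian_kernel(r - r_i, s)
                for r_i, s, w in zip(radii, sigmas, weights))
    return total / n_stars


def integrate(func, lo, hi, n=4000):
    """Trapezoidal rule on n equal sub-intervals."""
    h = (hi - lo) / n
    inner = sum(func(lo + j * h) for j in range(1, n))
    return h * (0.5 * (func(lo) + func(hi)) + inner)


# Toy sample (hypothetical values): two planets around 100 surveyed stars.
radii, sigmas, weights, n_stars = [1.0, 2.0], [0.1, 0.2], [10.0, 30.0], 100
npps = integrate(lambda r: mkde(r, radii, sigmas, weights, n_stars), 0.0, 4.0)
```

With these toy inputs the integral is close to $(10 + 30)/100 = 0.4$ planets per star, as Equation 1 requires.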

3. CALCULATING THE COOL KOI 
RADIUS FUNCTION 

One of the biggest concerns to date about interpreting Kepler data is uncertainty about stellar parameters. This applies both because the properties of the transit host stars are unknown (derived planet radius depends directly on the radius of the host star) and because the properties of the stars in the survey parent sample are unknown (i.e. is Kepler actually surveying dwarf stars or is the parent sample significantly contaminated by giants or subgiants? (Mann et al. 2012)).

Focusing on Kepler candidates around relatively low-mass stars alleviates these concerns. Many of these stars have spectroscopically measured stellar properties (Muirhead et al. 2012a; Mann et al. 2012), and in addition, the properties of the parent sample of target stars have been carefully characterized photometrically by Dressing & Charbonneau (2013). Such an investigation thus is narrower than attempting to use the whole Kepler sample, but the assurance of a good understanding of the stellar parameters of both the host stars and the general survey sample more than compensates for this loss of generality. In addition, focusing on these "Cool KOIs" enables detailed study of the radius distribution of Earth-sized and smaller planets.

To construct the planet radius function, we thus select the 113 planet candidates with periods <90 d identified in the cumulative KOI catalog posted at the NASA Exoplanet Archive that are hosted by stars with $T_{\rm eff} < 4000$ K as characterized by Dressing & Charbonneau (2013). To this sample we add the three KOI-961/Kepler-42 planets, which were left out of the Dressing & Charbonneau (2013) sample because the host star's broad-band colors are consistent with classification as either a giant or a dwarf, even though it has been spectroscopically confirmed to be a $\sim0.15\,M_\odot$ dwarf (Muirhead et al. 2012b). For stellar parameters we use the results presented in Dressing & Charbonneau (2013), except for those KOI host stars that have been spectroscopically characterized according to the observations and procedures described in Muirhead et al. (2012b), for which we use the spectroscopic parameters. As this spectroscopic method is known to be unreliable for $T_{\rm eff} > 3800$ K, we defer to the Dressing & Charbonneau (2013) parameters for stars in this temperature range.

[Figure 2. Legend: implied all-planet period distribution (corrected for transit probability); observed planets in radius bins 0.5-1.0, 1.0-1.5, and 1.5-2.0 $R_\oplus$, plus a fourth bin labeled at 2.0 $R_\oplus$. x-axis: Period [days].]

Fig. 2. The period distribution of planets around Kepler's M dwarfs. The grey shaded region is the implied period distribution of all planets combined, correcting for the effects of transit probability. The bar charts show the observed numbers of planets of different sizes in each period bin. Note the declining fraction of small planets as a function of period; this is most likely an effect of declining detection efficiency for smaller planets on longer-period orbits, and this must be properly accounted for when constructing the planet radius function. The radius function calculation in this paper assumes that all planets are distributed according to the shaded distribution, regardless of planet radius. See §5 for a discussion of this assumption.

In the following subsections, we describe the steps necessary to calculate $\phi_r^{90}$, the estimate of the radius function for planets on orbits <90 d, from this KOI sample. As described in §2, the crucial step toward properly estimating the radius function is calculating the weight factor $w_i = 1/\eta_i$ for each detection, which includes a transit probability factor and a completeness factor $\eta_{{\rm disc},i}$ (Equation 8). Key to calculating $\eta_{{\rm disc},i}$ is determining the SNR distribution of a hypothetical population of clones of planet $i$ around all the target stars, or $\phi_{{\rm SNR},i}$, which in turn requires an assumption of the intrinsic period distribution of planets $\phi_P$.



3.1. Period distribution 

In order to estimate the shape of the true period distribution of planets of all sizes, we make the simplifying assumption that the period distribution of planets is independent of their radii (see §5 for a discussion regarding this assumption). We thus construct the distribution of $\log P$ from all the planet candidates in the sample, using an MKDE as described in §2.2. For the weights we use only the inverse transit probabilities, and enforce that the whole distribution is normalized to unity, creating the probability density function for $\log P$. For the widths we use $\sigma = 0.15$ (in $\log P$), to create a smooth distribution. This is the period distribution function $\phi_P$ that we use in the following subsection, shown as the grey shaded region in Figure 2.

3.2. SNR distribution 

The SNR of a transit signal is usually defined as follows:

$${\rm SNR} = \frac{\delta}{\sigma} \sqrt{N_{\rm tr} \cdot N_{\rm pts}}, \qquad (11)$$

where $\delta$ is the transit depth, $\sigma$ is the one-point photometric uncertainty, $N_{\rm tr}$ is the number of transits observed, and $N_{\rm pts}$ is the number of photometric points per transit. It can be shown that, for a fixed planet radius, SNR should scale with host star radius $R_\star$, orbital period $P$, time observed $T_{\rm obs}$, transit duration $T_{\rm dur}$, and $\sigma$ as follows:

$${\rm SNR} \propto R_\star^{-2}\, P^{-1/2}\, T_{\rm obs}^{1/2}\, T_{\rm dur}^{1/2}\, \sigma^{-1}, \qquad (12)$$

where $T_{\rm dur}$ itself is a function of $P$, scaled semimajor axis $a/R_\star$, and impact parameter $b$. The putative SNR of a planet transplanted from its current configuration to a different period, impact parameter, and host star may thus be calculated by scaling the original SNR appropriately.
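The rescaling implied by Equation 12 can be sketched as a ratio of the new to the original configuration. The function and argument names are hypothetical, and the transit durations are taken as inputs here rather than computed, since $T_{\rm dur}$ depends on $P$, $a/R_\star$, and $b$ through a relation not spelled out in this section.

```python
def scale_snr(snr0, rstar0, period0, tdur0, sigma0,
              rstar, period, tdur, sigma, tobs_ratio=1.0):
    """Rescale a measured SNR to a new host star and orbit via Equation 12.

    Quantities with a trailing 0 describe the original configuration;
    the others describe the transplanted one. tobs_ratio is the ratio of
    new to original observing time (1 if both stars are observed equally).
    """
    return (snr0
            * (rstar / rstar0) ** -2.0       # transit depth scales as R_star^-2
            * (period / period0) ** -0.5     # fewer transits at longer period
            * tobs_ratio ** 0.5              # more transits with longer baseline
            * (tdur / tdur0) ** 0.5          # more points per transit
            * (sigma / sigma0) ** -1.0)      # noisier star lowers the SNR
```

For instance, moving a planet to a star of twice the radius, with everything else held fixed, reduces the SNR by a factor of four.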

For each Kepler target star, data on the photometric uncertainty $\sigma$ is available on a quarter-by-quarter basis, quantified by the "combined differential photometric precision" (CDPP) values on 3-hr, 6-hr, and 12-hr time intervals, which in principle should allow for calculation of the SNR for each transit signal. However, the KOI catalogs also provide SNR for each identified planet candidate, and we find that using CDPP values and Equation 11 to calculate SNR does not reliably reproduce the catalog values (it typically underestimates by about 30%, with significant scatter). And since the SNR ramp efficiency characterization (ramp from 0 at SNR = 6 to 1 at SNR = 12) was developed using the catalog SNR values, those are the SNRs we use. We do assume, however, that SNR still scales according to Equation 12.

Fig. 3. Two examples of the SNR distributions resulting from simulating the transit of a given planet around every target star, given randomly assigned periods and impact parameters. The properties of this distribution depend on the properties of the detected system, and the integral of the pipeline detection efficiency function over this distribution gives the "discovery fraction" $\eta_{\rm disc}$. KOI-961.01, a sub-Earth-sized planet in a very short orbit around a very small star (Muirhead et al. 2012b), would have been detectable in only a fraction of potential configurations, whereas KOI-952.03, a larger planet around a larger star (Swift et al. 2013), would be detectable in almost any configuration, even though its actual SNR is smaller than that of KOI-961.01.

And so, for each of the planet candidates $i$ in the Cool KOI sample, we take its SNR as provided by the Exoplanet Archive and construct $\phi_{{\rm SNR},i}$ by simulating the entire population of hypothetical alternative configurations: 10,000 iterations of randomly chosen periods (according to the distribution $\phi_P$ described in §3.1) and impact parameters (according to a uniform distribution from 0 to 1) around each star in the target population, calculating the appropriate SNR for each instance according to the scalings in Equation 12. As the total number of simulated SNRs is sufficiently large ($\sim10^7$), the smooth final shape of $\phi_{{\rm SNR},i}$ is defined by interpolating a histogram with bin width $\Delta{\rm SNR} = 1$. Figure 3 illustrates examples of this distribution for two KOIs.



3.3. Radius Distribution

Once $\phi_{{\rm SNR},i}$ is constructed for every planet, we then calculate $\eta_{{\rm disc},i}$ for each planet according to Equation 8, which combined with the transit probability gives the MKDE weight $w_i$ for each planet, thus building $\phi_r^{90}$ (Equation 10). This function is plotted as the solid black line in Figure 4. To estimate the variance of this density estimator, we perform 1000 different bootstrap resamplings of the true planet dataset and recreate the MKDE for each resampling. The $1\sigma$ uncertainty region determined by this procedure is illustrated as the grey shaded region in Figure 4, and is conceptually equivalent to a running Poisson error bar. The widths $\sigma_i$ used to smooth the MKDE are taken to be twice the individual planet radius uncertainties, as this smooths out high-frequency wiggles while retaining broad features. Table 1 presents the data that goes into constructing this distribution.

4. RESULTS

The overall normalization of the radius function shown in Figure 4 indicates that there are approximately 1.5 planets per cool star with periods <90 d. In addition, there are several notable features of this distribution. The first is the peak between 1 and 1.5 $R_\oplus$ and the turnover below, both of which are robust features of the empirical distribution as characterized by the bootstrap uncertainty analysis. If this feature continues to hold as more and more candidates are identified around cool stars, it would point to a dramatic feature of planet formation and evolution: $\sim1\,R_\oplus$ is the most common planet size to survive long-term in short orbits around cool stars. This might be understood by an explanation similar to that provided to explain the origins of the inner Solar System (Goldreich et al. 2004; Chambers 2001): a large number of isolation-mass protoplanets form quickly, and once the gas and planetesimal disk dissipates, a period of dynamical instability follows, at the end of which typically only a few larger planets remain, the rest having been either destroyed (or merged) via collisions or been swallowed by the host star. It is certainly plausible that $\sim1\,R_\oplus$ planets might be the most likely outcome of this process, as this is precisely what has happened with the inner Solar System, with an outcome of two planets about the size of Earth.

[Figure 4. x-axis: Planet Radius [$R_\oplus$], 0.5 to 4.0. Annotation: 1.5 planets per cool star.]

Fig. 4. The empirical radius distribution of planets orbiting M dwarfs with periods <90 days (black continuous curve), estimated with a modified kernel density estimator (MKDE; see §3.3), with the bootstrap resampling-derived $1\sigma$ uncertainty swath shaded grey, essentially a running Poisson error bar. The detection efficiency as a function of signal-to-noise ratio has been quantified by an SNR ramp from 0 at SNR = 6 to 1 at SNR = 12. The blue horizontal lines represent the standard "occurrence rate per bin" calculations for this sample. The vertical red lines represent the radii of individual planets in the sample, with their heights being proportional to the weight factors $w_i$. The green dotted curve is an $R^{-2}$ power law, which bears a striking (and uncontrived) resemblance to the shape of this non-parametric radius function between about 1.25 and 2 $R_\oplus$; below about 1.25 $R_\oplus$, the distribution appears to level off, and turn over below 1 $R_\oplus$. Between about 2 and 2.5 $R_\oplus$ there appears to be an excess over a smooth distribution; this may be caused by a significant population of planets with H/He atmospheres. There is an average of 1.5 planets per cool star in orbits <90 days over this radius range, and there is an average of greater than 0.5 planets per cool star in this period range for radii between 1 and 1.5 $R_\oplus$.
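The bootstrap uncertainty band described in §3.3 and shaded in Figure 4 can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the figure: the function names are hypothetical, the kernels are taken to be Gaussian, and the $1\sigma$ region is approximated by the 16th and 84th percentiles of the resampled curves, a choice the text does not specify.

```python
import math
import random


def mkde(r, radii, sigmas, weights, n_stars):
    """Equation 10 with Gaussian kernels, evaluated at a single radius r."""
    total = 0.0
    for r_i, s_i, w_i in zip(radii, sigmas, weights):
        z = (r - r_i) / s_i
        total += w_i * math.exp(-0.5 * z * z) / (s_i * math.sqrt(2.0 * math.pi))
    return total / n_stars


def bootstrap_band(grid, radii, sigmas, weights, n_stars, n_boot=1000, seed=0):
    """Resample planets with replacement and rebuild the MKDE each time.

    Returns the pointwise 16th and 84th percentile curves on the grid
    (an assumed stand-in for the 1-sigma region).
    """
    rng = random.Random(seed)
    n = len(radii)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r_b = [radii[i] for i in idx]
        s_b = [sigmas[i] for i in idx]
        w_b = [weights[i] for i in idx]
        curves.append([mkde(r, r_b, s_b, w_b, n_stars) for r in grid])
    lower, upper = [], []
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[int(0.16 * (n_boot - 1))])
        upper.append(column[int(0.84 * (n_boot - 1))])
    return lower, upper
```

Resampling whole planets, each carrying its own radius, width, and weight, keeps the correlation between a planet's size and its completeness correction intact in every bootstrap realization.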



The second notable feature of the distribution is the plateau between about 2 and 2.5 $R_\oplus$ and the steep decline above. This could plausibly be the effect of atmospheres, with planets massive enough to retain primordial H/He envelopes showing up as an excess population of planets this size compared to what would be expected from extrapolating toward larger radii from $\sim$1-1.5 $R_\oplus$. One prediction of this hypothesis would be that most of the planets smaller than 2 $R_\oplus$ are on average more dense than planets between 2 and 3 $R_\oplus$, pointing to a smoother underlying mass distribution.

Finally, this distribution indicates that planets larger than $\sim3\,R_\oplus$ are very rare around cool stars, consistent with the findings of RV surveys (Endl et al. 2003; Bonfils et al. 2013). There is one hot Jupiter identified around a star in this sample (KOI-254b/Kepler-45b; Johnson et al. 2012) and another recent discovery of note (Triaud et al. 2013), but such planets are clearly exceptional; the vast majority of close-in planets around cool stars are smaller than $\sim3\,R_\oplus$. Even Gliese 1214b (Charbonneau et al. 2009), by far the best-studied planet around an M dwarf to date, appears to be an exception to the typical system, as its radius of 2.7 $R_\oplus$ falls far down the tail of this distribution. In fact, there are $\sim30\times$ more planets smaller than Gl 1214b than there are larger than Gl 1214b; this bodes very well for the future of ground-based surveys, both transit and RV, as they can become more sensitive to smaller planets.

To explore the degree to which the SNR ramp and the period redistribution
affect the shape of the derived radius function, we repeat this analysis using
only the Q1-Q6 KOI catalog (Batalha et al. 2012) and out to a period of 50
days, to match the analysis of Dressing & Charbonneau (2013). We then compare
our full analysis to one using a strict SNR = 7.1 detection threshold (i.e.,
η_SNR = 1 above SNR = 7.1 and η_SNR = 0 below), and also to one using an
alternative construction of φ_SNR,i in which the period of planet i is kept
fixed. Figure 5 illustrates these alternative estimates of the radius
function. Method (2), illustrated by the dashed line, mirrors the analysis of
Fressin et al. (2013), who corrected detections for the SNR ramp effect, but
not for the period distribution. Method (3), illustrated by the dotted line,
mirrors the analysis of Dressing & Charbonneau (2013), who used an SNR = 7.1
threshold and did not correct for orbital period. We show that these
corrections make for a nearly 50% increase in the total inferred number of
planets/star for P < 50 d and, notably, about a factor of two increase in the
number of planets smaller than 1.4 R⊕. From this comparison, we estimate that
the true mean number of Earth-sized (0.5-1.4 R⊕) planets in the habitable
zones (HZs) of these cool stars is at least twice as high as the lower limit
estimated by Dressing & Charbonneau (2013); that is, probably closer to ~0.30,
rather than 0.15. Using revised calculations of the HZ, Kopparapu (2013)
calculates a rate of ~0.50 habitable Earth-sized planets around cool stars;
the completeness considerations in this paper should increase that estimate to
~1 planet/star.

[Figure 5 legend: (1) This work (SNR ramp + period correction), 1.4
planets/star; (2) SNR ramp, no period correction, 1.2 planets/star; (3) 7.1
threshold, no period correction, 0.9 planets/star. Histograms: occurrence
rates in bins used by Dressing & Charbonneau (2013). Horizontal axis: Planet
Radius [R⊕].]

Fig. 5. The planet radius distribution for P < 50 days, using the Batalha et
al. (2012) catalog in order to compare to previous studies, demonstrating the
effect of the corrections accounted for in this work. The continuous curves
are the non-parametric empirical density estimates, and the horizontal blue
lines are the occurrence rates per bin, using the same bins as Dressing &
Charbonneau (2013). The vertical red lines represent the radii of individual
planets in the sample, with their heights being proportional to the weight
factors w_i. The non-solid linestyles represent different analysis methods.
Whereas Method 1 (solid lines) uses the full analysis described in this paper,
with detection efficiency described by an SNR ramp (Fressin et al. 2013) and
φ_SNR,i constructed by assigning random periods (§3.2), Methods 2 (dashed) and
3 (dotted) both keep period fixed when constructing φ_SNR,i, and Method 3 uses
an SNR = 7.1 detection threshold rather than the SNR ramp. Method 2 is similar
to the occurrence calculation in Fressin et al. (2013) and Method 3 uses the
methods employed in Dressing & Charbonneau (2013). The importance of both a
well-characterized detection efficiency function and treating period-based
incompleteness correctly is clear: incorporating both these considerations
significantly changes both the qualitative shape (especially as visualized
with histograms) and normalization of the radius function below 2 R⊕. In
particular, the Dressing & Charbonneau (2013) analysis underestimates the
occurrence rates of planets between 0.5 and 1.4 R⊕ by about a factor of two.

Figures 4 and 5 also display the results of these calculations in the more
traditional format of a histogram of planet occurrence in different radius
bins. In order to calculate these histograms, we simply add up all the weights
in each of the radius bins. Figure 4 uses linearly spaced bins; Figure 5 uses
the same logarithmic bins used by Dressing & Charbonneau (2013) for
comparison. There are several qualitative points to note regarding these
histograms. The first is that they can be visually deceptive: for example, the
dotted histogram in Figure 5 is approximately flat between 1 and 2 R⊕ before
decreasing in the 0.7-1 R⊕ bin, even though the smoothed distributions
continue rising steadily all the way to ~1.25 R⊕, a result of logarithmically
spaced bins. Secondly, they only provide a very coarse description of the
shape of the radius distribution; that is, there is much more detectable
structure than can be captured in a few bins. For example, the plateau in
Figure 4 between 2 and 2.5 R⊕ is not captured in the histogram illustration,
nor is the steep falloff above 2.5 R⊕, nor the identification of ~1.25 R⊕ as
the location of the low-end turnover.
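The per-bin occurrence rates described above (adding up the weights in each radius bin) can be sketched as follows. This is a minimal illustration, not the analysis code of this work: it uses only a handful of (radius, weight) pairs copied from Table 1, assumes the summed weights are divided by the approximate number of target stars (about 3900, as quoted in the conclusions), and uses hypothetical linear bin edges, so the printed numbers are not the rates plotted in Figure 4.

```python
# Illustrative sketch only: a small subset of Table 1, not the full sample.
N_STARS = 3900  # approximate size of the cool-star sample (assumed normalization)

# (planet radius in Earth radii, weight w_i) for a few Table 1 entries
planets = [
    (0.57, 140.1), (0.63, 108.2), (0.63, 84.2), (0.67, 85.3),
    (0.73, 39.9), (0.98, 33.7), (2.57, 67.9), (2.82, 39.2),
]


def occurrence_per_bin(planets, edges, n_stars):
    """Sum the weights of planets falling in each [lo, hi) bin, per star."""
    rates = [0.0] * (len(edges) - 1)
    for radius, weight in planets:
        for k in range(len(edges) - 1):
            if edges[k] <= radius < edges[k + 1]:
                rates[k] += weight / n_stars
                break
    return rates


# Hypothetical linearly spaced bin edges in Earth radii
linear_edges = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
rates = occurrence_per_bin(planets, linear_edges, N_STARS)

for lo, hi, rate in zip(linear_edges[:-1], linear_edges[1:], rates):
    print(f"{lo:.1f}-{hi:.1f} R_earth: {rate:.3f} planets/star")
```

Because each planet contributes its whole weight to a single bin, the result depends on where the bin edges fall, which is the sensitivity to binning that the smoothed estimate is meant to avoid.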

5. EXPLORING ASSUMPTIONS 

There are two assumptions that we have made to construct this radius
distribution:

• The period distribution of planets is both independent of planet radius and
  well-characterized by the current planet detections.

• The detection efficiency of planets in the Batalha et al. (2012) catalog
  follows an SNR ramp similar to that described in Fressin et al. (2013).

Are these assumptions justified? What are the implications if they are
incorrect?

TABLE 1
Data Used in Radius MKDE

KOI             R_p [R⊕]   σ_p     η_tr     η_disc   w_i
KOI961.03 a     0.57       0.18    0.051    0.14     140.1
KOI2453.01 a    0.63       0.11    0.077    0.12     108.2
KOI2542.01 a    0.63       0.08    0.132    0.09     84.2
KOI1422.03 a    0.67       0.11    0.051    0.23     85.3
KOI961.02 a     0.73       0.20    0.132    0.19     39.9
KOI251.02 b     0.78       0.09    0.045    0.55     40.4
KOI961.01 a     0.78       0.22    0.068    0.21     70.0
KOI952.05 a     0.82       0.08    0.180    0.44     12.6
KOI1843.02 a    0.83       0.12    0.044    0.33     68.9
KOI2006.01 a    0.84       0.07    0.067    0.33     45.2
KOI2238.01 a    0.84       0.13    0.093    0.21     51.2
KOI1146.01 a    0.88       0.18    0.034    0.33     89.1
KOI1702.01 a    0.88       0.13    0.076    0.27     48.7
KOI250.03 b     0.88       0.19    0.059    0.69     24.6
KOI2662.01      0.92       0.22    0.085    0.33     35.7
KOI2036.02      0.96       0.16    0.047    0.56     38.0
KOI2306.01      0.97       0.08    0.241    0.51     8.1
KOI1649.01 b    0.98       0.12    0.057    0.52     33.7
KOI255.01 a     2.57       0.09    0.016    0.92     67.9
KOI2926.01 b    2.57       0.29    0.029    0.90     38.3
KOI248.01 b     2.69       0.31    0.041    0.91     26.8
KOI531.01 b     2.78       0.40    0.062    0.96     16.8
KOI2156.01 a    2.81       0.20    0.068    0.89     16.5
KOI781.01 a     2.82       0.10    0.028    0.91     39.2

a Planet radius based on spectroscopic stellar parameters from the analysis of
  Muirhead et al. (2012a).
b Spectroscopic stellar characterization not available, or T_eff > 3800 K, so
  planet radius based on stellar parameters from Dressing & Charbonneau (2013).
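As a quick consistency check on Table 1, the tabulated weights appear to equal the inverse of the product of the two columns that precede them. The sketch below assumes those two columns are efficiency factors, written here as eta_tr and eta_disc (the variable and function names are illustrative, not taken from the paper), and compares the recomputed weight with the tabulated one for a few rows.

```python
# (KOI, eta_tr, eta_disc, tabulated w_i) copied from a few rows of Table 1
rows = [
    ("KOI961.03", 0.051, 0.14, 140.1),
    ("KOI2453.01", 0.077, 0.12, 108.2),
    ("KOI952.05", 0.180, 0.44, 12.6),
    ("KOI255.01", 0.016, 0.92, 67.9),
    ("KOI781.01", 0.028, 0.91, 39.2),
]


def inverse_efficiency_weight(eta_tr, eta_disc):
    """Weight as the inverse of the combined efficiency (assumed form)."""
    return 1.0 / (eta_tr * eta_disc)


residuals = []
for name, eta_tr, eta_disc, w_tab in rows:
    w = inverse_efficiency_weight(eta_tr, eta_disc)
    residuals.append(abs(w - w_tab))
    print(f"{name}: recomputed {w:.1f}, tabulated {w_tab:.1f}")
```

Under this reading, a small planet with a low discovery efficiency stands in for many undetected planets, which is why the smallest planets in the table carry the largest weights.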

5.1. Period Distribution Assumption 

Figure 1 illustrates very clearly why the detected population of small planets
in the Cool KOI sample is indeed very likely incomplete: the detected period
distribution of the smallest of the Cool KOIs drops off right around the
periods where the known short-period small KOIs would have become
undetectable. This is the motivation behind the period redistribution
procedure we use to calculate η_disc,i in §3.2: correcting for the
undetectable longer-period small planets. Such a correction is surely needed;
however, the nature of this correction as applied in this work (using the
implied all-planet period distribution for each planet) merits some
discussion.

There are certainly both physical reasons and observational suggestions to
believe that the planet period distribution is not completely independent of
radius. In particular, Howard et al. (2012) find (shown in their Figure 6)
that the fraction of short-period planets that are large (4-8 R⊕) is smaller
than the fraction of longer-period planets that are large; in other words, the
period distribution of larger planets decreases (heading towards shorter
periods) sooner than does the distribution of smaller planets (2-4 R⊕). Dong &
Zhu (2012) present a similar finding. While there is not yet compelling
evidence that this same effect has been detected for planets smaller than 2
R⊕, simple physical considerations such as increasing stellar insolation
(Weiss & Marcy 2013) might reasonably contribute to a dearth of larger planets
on short-period orbits. However, as there is no corresponding clear physical
explanation for the absence of smaller planets in longer orbits, it is
reasonable to assume that they do in fact exist, and that their period
distribution might resemble the period distribution of the larger planets that
are detected in such orbits.

In addition, if small planets are in general more common than larger planets
(as it appears), and small planets are not being detected on longer periods,
then approximating the distribution of all planets with just the total
observed distribution will naturally underestimate the total numbers of
longer-period planets. In fact, looking at the all-planet distribution in
Figure 2, it is quite reasonable to expect that the apparent decrease of the
distribution function longer than ~20 days is actually due to the fact that
only planets larger than 1.5 R⊕ or so (which may very well be a minority of
all planets) are being readily detected at these periods. This will cause an
overestimate of the true fraction of large planets that are at shorter
periods, and an underestimate of the true numbers of small planets that are at
longer periods.

The total effect of this assumption will thus be to systematically shift the
simulated "planet clone" distributions towards shorter periods, and thus the
SNR distributions φ_SNR,i toward larger SNRs. This will cause η_disc,i values
to be slightly overestimated, which will lead to underestimating the weights
w_i and subsequently the normalization of the radius function, especially
towards smaller planets, which depend most heavily on this correction.
However, this effect is a very small one: we repeated the analysis using a
distribution in log P that has an exponential cutoff below 10 days and is flat
above (that is, without the dip beyond ~15 days that appears in Figure 2), and
there is negligible difference in the ensuing radius distribution, with the
only change being a few percent increase in the planet occurrence in the 1-1.5
R⊕ bin in Figure 4, and notably no change to the 0.5-1 R⊕ bin.

5.2. SNR Ramp Assumption 

Figure 5 shows that quantifying the detection efficiency of the Kepler
pipeline as an SNR ramp (Equation 7) rather than a strict threshold cut makes
a significant difference in the inferred occurrence rate of planets smaller
than 2 R⊕. In particular, the value of SNR by which the pipeline is assumed to
be 100% complete changes the overall normalization of the low end of the
radius distribution. While the 6-16 SNR ramp characterization of the
efficiency for the Batalha et al. (2012) catalog was introduced and defended
by Fressin et al. (2013), and was adjusted in this work (Figure 4) to be
relevant to the Q1-Q8 catalog (F. Fressin, priv. comm.), it should be treated
as a temporary solution until the detection efficiency of the Kepler pipeline
can be directly quantified as a function of SNR through injection/recovery
simulations. However, the notable features in Figures 4 and 5 (the rise down
to about 1 R⊕ and the small plateau around 2.5 R⊕) are robust to the precise
details of the ramp.
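The difference between a ramp and a strict threshold can be made concrete with a small sketch. It assumes the ramp rises linearly from 0 at SNR = 6 to 1 at SNR = 12 (the endpoints quoted in the Figure 4 caption; the linear shape is an assumption here), and it assumes that a planet's discovery efficiency is the average of the detection efficiency over a set of simulated SNR values. The function names and the sample SNR values are hypothetical.

```python
def ramp_efficiency(snr, lo=6.0, hi=12.0):
    """Linear ramp: 0 at or below lo, 1 at or above hi (assumed shape)."""
    if snr <= lo:
        return 0.0
    if snr >= hi:
        return 1.0
    return (snr - lo) / (hi - lo)


def threshold_efficiency(snr, cut=7.1):
    """Strict cut: fully detectable above the threshold, undetectable below."""
    return 1.0 if snr > cut else 0.0


def mean_efficiency(snr_samples, efficiency):
    """Average detection efficiency over a set of simulated SNR values."""
    return sum(efficiency(s) for s in snr_samples) / len(snr_samples)


# Hypothetical SNR values for simulated copies of one small planet
snr_samples = [5.0, 7.0, 8.0, 9.0, 10.0, 13.0]

eta_ramp = mean_efficiency(snr_samples, ramp_efficiency)
eta_cut = mean_efficiency(snr_samples, threshold_efficiency)
print(f"ramp: {eta_ramp:.3f}, threshold: {eta_cut:.3f}")
```

For signals clustered just above the threshold, the strict cut assigns full efficiency where the ramp assigns only partial efficiency, so the cut yields smaller weights and a lower inferred occurrence of small planets.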

6. CONCLUSIONS 

We present a simple non-parametric method of analyzing the empirical shape of
the planet radius distribution from a transit survey: the modified kernel
density estimator, or MKDE. This estimator is similar to a standard kernel
density estimator, except that its overall normalization is constructed to be
equal to the total number of planets per star, and that each data point is
weighted according to its inverse detection efficiency. We also show that
properly computing this efficiency requires two considerations that have not
always been applied in previous occurrence rate studies: correcting for the
planet period distribution when calculating the number of target stars around
which a particular planet could have been observed, and considering that the
detection efficiency is a rising function of signal-to-noise ratio, and not
just a strict cutoff.
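The two properties named above (inverse-efficiency weights and a normalization equal to the number of planets per star) can be illustrated with a minimal weighted kernel density sketch. It assumes Gaussian kernels with a single hypothetical bandwidth of 0.2 R⊕ and a sample of about 3900 stars; the kernel shape, the bandwidth, and the function names are illustrative choices and are not claimed to be those of the MKDE used in this work. The radii and weights are a small subset of Table 1.

```python
import math

N_STARS = 3900    # approximate number of target stars (assumed normalization)
BANDWIDTH = 0.2   # hypothetical kernel width in Earth radii

radii = [0.57, 0.63, 0.63, 0.67, 0.73, 0.98, 2.57, 2.82]
weights = [140.1, 108.2, 84.2, 85.3, 39.9, 33.7, 67.9, 39.2]


def gaussian(x, mu, sigma):
    """Unit-area Gaussian kernel centered on mu."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_kde(r):
    """Weighted density at radius r, in planets per star per Earth radius."""
    total = sum(w * gaussian(r, ri, BANDWIDTH) for ri, w in zip(radii, weights))
    return total / N_STARS


def integrate(f, a, b, n=4000):
    """Trapezoidal rule on n equal steps."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


planets_per_star = integrate(weighted_kde, -2.0, 6.0)
expected = sum(weights) / N_STARS
print(f"integral: {planets_per_star:.4f}, sum(w)/N: {expected:.4f}")
```

Because each kernel has unit area, the curve integrates to the summed weights divided by the number of stars, so its area reads directly as planets per star instead of integrating to one as a standard kernel density estimate would.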

Applying this analysis to the 113 planet candidates currently in the
cumulative KOI catalog with periods less than 90 days discovered around the
cool Kepler targets photometrically characterized by Dressing & Charbonneau
(2013), we identify several key features of the radius distribution of small
planets around small stars which invite theoretical explanation. First, even
correcting carefully for incompleteness, the data indicate a flattening or
turnover of the distribution around about 1-1.25 R⊕, suggesting that planets
about this size are the most common to survive in short orbits around cool
stars. Notably, this feature of the distribution is robust to incompleteness
below 1 R⊕: it appears that planets smaller than Earth are indeed rarer around
cool stars than planets around the size of Earth. Secondly, there appears to
be a plateau from about 2 to 2.5 R⊕, where there is an overabundance of
planets as compared to what would be predicted from a smooth distribution
extrapolating from small to larger planet radii, perhaps an indication of a
population of planets with significant H/He atmospheres. Finally, the
occurrence pattern of planets around cool stars indicates that there are many
planets just beyond the detection threshold of ground-based surveys, as
planets larger than Gl 1214b (2.7 R⊕) are ~30 times rarer than planets with
R_p < 2.7 R⊕.

Comparing this non-parametric radius function estimate with the more
traditional presentation of planet occurrence rate in different radius bins
demonstrates that the bin presentation can be visually misleading, in addition
to missing details of the distribution that are accessible in the data.
Comparing our results to the occurrence calculations of Dressing & Charbonneau
(2013), we find that there are about a factor of two more planets from 0.5 to
1.4 R⊕ than that analysis determined; this would imply that there are an
average of ~0.30 habitable-zone Earth-like planets per cool star, rather than
the ~0.15 estimated by that work. If this same correction is made to the
calculations of Kopparapu (2013), which use updated HZ calculations but the
same occurrence formalism as Dressing & Charbonneau (2013), then this number
would become closer to ~1 planet per star. Habitable-zone, Earth-sized planets
abound throughout the Galaxy in numbers even larger than previously estimated.

In addition to demonstrating how to extract empirical distributions from
Kepler data without relying on arbitrary binning, we call attention to the
importance of understanding in detail the detection efficiency of transit
search algorithms. Future studies can most directly support analyses such as
these, which lie at the very core of the Kepler mission, by directly computing
the detection efficiency of these algorithms as a function of signal-to-noise
ratio, enabling reliable correction for incompleteness near the detection
threshold, where many of the most scientifically interesting discoveries will
be made. Finally, we emphasize that this calculation is based on a target
sample of only about 3900 cool stars and a KOI search only through Q8 data.
Continued expansion of the cool star Kepler sample by re-appropriation of
target pixels could potentially increase this sample size by a factor of two
or more, allowing for greatly strengthened conclusions from the small-planet
radius distribution and giving a greater handle on the formation processes of
planetary systems around the most numerous stars in the Galaxy. In addition,
careful application of these same principles to the entire Kepler dataset, as
permitted by accurate knowledge of stellar parameters, will continue to
uncover important clues to the formation and evolution of all types of
planetary systems.